
14 research outputs found

Analyzing establishment nonresponse using an interpretable regression tree model with linked administrative data

Authors: Polly Phipps, Daniell Toth
Publication venue: Institute of Mathematical Statistics
Publication date: 28/06/2012

To gain insight into how characteristics of an establishment are associated with nonresponse, a recursive partitioning algorithm is applied to the Occupational Employment Statistics (OES) May 2006 survey data to build a regression tree. The tree models an establishment's propensity to respond to the survey given certain establishment characteristics. It provides mutually exclusive cells, defined by those characteristics, with homogeneous response propensities. This makes it easy to identify interpretable associations between the characteristic variables and an establishment's propensity to respond, something not easily done using a logistic regression propensity model. We test the model obtained using the May data against data from the November 2006 OES survey. Testing the model on a disjoint set of establishment data with a very large sample size (n = 179,360) offers evidence that the regression tree model accurately describes the association between the establishment characteristics and the response propensity for the OES survey. The accuracy of this modeling approach is compared to that of logistic regression through simulation. This representation is then used, along with frame-level administrative wage data linked to sample data, to investigate the possibility of nonresponse bias. We show that without proper adjustments the nonresponse does pose a risk of bias and is possibly nonignorable.

Comment: Published in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org), at http://dx.doi.org/10.1214/11-AOAS521.
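The recursive-partitioning idea can be illustrated with a minimal sketch. This is not the algorithm or software used in the paper: it simply grows a regression tree on a 0/1 response indicator by choosing, at each node, the split that most reduces the within-node sum of squared errors, so each leaf is a mutually exclusive cell whose mean is its estimated response propensity. The variable names (`size_class`, `multi_unit`), the stopping rules (`max_depth`, `min_leaf`, `min_gain`) and the synthetic records are hypothetical, and survey weights and pruning are deliberately left out.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# One record: (establishment characteristics, responded indicator 0/1).
Record = Tuple[Dict[str, int], int]


@dataclass
class Node:
    propensity: float                # mean response indicator in this cell
    size: int                        # number of establishments in this cell
    var: Optional[str] = None        # splitting variable; None marks a leaf
    threshold: Optional[int] = None  # left branch holds x[var] <= threshold
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def sse(ys: List[int]) -> float:
    """Sum of squared deviations from the node mean (regression-tree impurity)."""
    if not ys:
        return 0.0
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)


def best_split(data: List[Record], min_leaf: int, min_gain: float):
    """Return the (var, threshold) with the largest SSE reduction, or None."""
    parent = sse([y for _, y in data])
    best, best_gain = None, min_gain
    for var in data[0][0]:
        values = sorted({x[var] for x, _ in data})
        for t in values[:-1]:
            left = [y for x, y in data if x[var] <= t]
            right = [y for x, y in data if x[var] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            gain = parent - sse(left) - sse(right)
            if gain > best_gain:
                best, best_gain = (var, t), gain
    return best


def grow(data: List[Record], depth: int = 0, max_depth: int = 3,
         min_leaf: int = 20, min_gain: float = 1e-6) -> Node:
    """Recursively partition records into cells with homogeneous propensity."""
    ys = [y for _, y in data]
    node = Node(propensity=sum(ys) / len(ys), size=len(ys))
    split = best_split(data, min_leaf, min_gain) if depth < max_depth else None
    if split is None:
        return node
    node.var, node.threshold = split
    left = [r for r in data if r[0][node.var] <= node.threshold]
    right = [r for r in data if r[0][node.var] > node.threshold]
    node.left = grow(left, depth + 1, max_depth, min_leaf, min_gain)
    node.right = grow(right, depth + 1, max_depth, min_leaf, min_gain)
    return node


def predict(node: Node, x: Dict[str, int]) -> float:
    """Follow the splits down to the leaf (cell) that contains x."""
    while node.var is not None:
        node = node.left if x[node.var] <= node.threshold else node.right
    return node.propensity


def rules(node: Node, conds: Tuple[str, ...] = ()) -> List[Tuple[str, int, float]]:
    """List each leaf as (readable rule, cell size, response propensity)."""
    if node.var is None:
        return [(" and ".join(conds) or "all", node.size, node.propensity)]
    return (rules(node.left, conds + (f"{node.var} <= {node.threshold}",))
            + rules(node.right, conds + (f"{node.var} > {node.threshold}",)))


# Hypothetical synthetic sample: 40 establishments per (size_class, multi_unit)
# cell; size_class <= 2 responds 32 times in 40, larger classes 16 times in 40.
data: List[Record] = []
for size_class in range(1, 6):
    for multi_unit in (0, 1):
        responders = 32 if size_class <= 2 else 16
        for k in range(40):
            x = {"size_class": size_class, "multi_unit": multi_unit}
            data.append((x, int(k < responders)))

tree = grow(data)
for rule, n, p in rules(tree):
    print(f"{rule}: n={n}, response propensity={p:.2f}")
```

On this toy data the tree should split once on `size_class`, and each printed rule reads directly as an interpretable cell, which is the property the abstract contrasts with a logistic regression propensity model.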

arXiv.org e-Print Archive

Crossref

Bayesian Estimation Under Informative Sampling

Authors: Terrance D. Savitsky, Daniell Toth
Publication date: 01/01/2016

Bayesian analysis is increasingly popular in social science and other application areas where the data are observations from an informative sample. An informative sampling design leads to inclusion probabilities that are correlated with the response variable of interest. Under informative sampling, model inference performed on the observed sample will be biased for the population generating model, since the balance of information in the sample data differs from that in the population. Typical approaches to account for an informative sampling design under Bayesian estimation are often difficult to implement because they require re-parameterization of the hypothesized generating model, or they focus on design-based rather than model-based inference. We propose to construct a pseudo-posterior distribution that uses sampling weights based on the marginal inclusion probabilities to exponentiate the likelihood contribution of each sampled unit, which weights the information in the sample back to the population. Our approach provides a nearly automated estimation procedure applicable to any model specified by the data analyst for the population, and it retains the population model parameterization and posterior sampling geometry. We construct conditions on known marginal and pairwise inclusion probabilities that define a class of sampling designs where $L_1$ consistency of the pseudo-posterior is guaranteed. We demonstrate our method on an application concerning the Bureau of Labor Statistics Job Openings and Labor Turnover Survey.

Comment: 24 pages, 3 figures
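The pseudo-posterior construction, where each unit's likelihood contribution is raised to the power of its sampling weight, can be sketched as below. This is an illustration under stated assumptions, not the authors' implementation: weights are taken proportional to inverse marginal inclusion probabilities and rescaled to sum to the sample size n (an assumed normalization), and a toy normal-mean model with known sigma and a conjugate normal prior is used so that the pseudo-posterior has a closed form. All function names are hypothetical; for a general model, `log_pseudo_likelihood` plus a log prior would instead be passed to a posterior sampler.

```python
import math
from typing import Callable, List, Sequence, Tuple


def normalized_weights(incl_probs: Sequence[float]) -> List[float]:
    """Inverse inclusion-probability weights rescaled to sum to n (assumption)."""
    raw = [1.0 / p for p in incl_probs]
    scale = len(raw) / sum(raw)
    return [w * scale for w in raw]


def log_pseudo_likelihood(theta: float, y: Sequence[float], w: Sequence[float],
                          loglik: Callable[[float, float], float]) -> float:
    """Sum of w_i * log f(y_i | theta), i.e. each likelihood term raised to w_i."""
    return sum(wi * loglik(yi, theta) for yi, wi in zip(y, w))


def normal_loglik(sigma: float) -> Callable[[float, float], float]:
    """Return log N(y | mu, sigma^2) as a function of (y, mu)."""
    def f(y: float, mu: float) -> float:
        return (-0.5 * math.log(2 * math.pi * sigma ** 2)
                - (y - mu) ** 2 / (2 * sigma ** 2))
    return f


def pseudo_posterior_normal_mean(y: Sequence[float], incl_probs: Sequence[float],
                                 sigma: float, prior_mean: float,
                                 prior_sd: float) -> Tuple[float, float]:
    """Closed-form pseudo-posterior (mean, sd) for a normal mean, known sigma."""
    w = normalized_weights(incl_probs)
    precision = 1 / prior_sd ** 2 + sum(w) / sigma ** 2
    mean = (prior_mean / prior_sd ** 2
            + sum(wi * yi for wi, yi in zip(w, y)) / sigma ** 2) / precision
    return mean, math.sqrt(1 / precision)


# Hypothetical sample: the third unit had a lower inclusion probability,
# so it stands in for more of the population and pulls the estimate upward.
y = [1.0, 2.0, 3.0]
equal = pseudo_posterior_normal_mean(y, [0.5, 0.5, 0.5], 1.0, 0.0, 1e3)
unequal = pseudo_posterior_normal_mean(y, [0.5, 0.5, 0.25], 1.0, 0.0, 1e3)
print(equal, unequal)
```

With equal inclusion probabilities the weights are all 1 and the sketch reduces to the ordinary posterior; rescaling the weights to sum to n keeps the pseudo-posterior from being artificially concentrated by the raw sum of inverse probabilities.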

arXiv.org e-Print Archive

Crossref

Adding interior points to an existing Brownian sheet lattice

Author: Daniell Toth

We compute the conditional distribution of new interior points given a lattice representing a path of a Brownian sheet process in discrete time. This is done so that we can simulate paths of this multi-parameter Gaussian process by refining previously simulated paths, which allows one to refine a particular area of the path that is of interest.

Keywords: Brownian sheet; Simulation; Conditional distribution
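Because the Brownian sheet is Gaussian, the conditional distribution of a new point given existing lattice values can be sketched with ordinary multivariate-normal conditioning, using the standard Brownian sheet covariance Cov(W(s1, t1), W(s2, t2)) = min(s1, s2) * min(t1, t2). This is an illustrative sketch, not the paper's derivation: it conditions on every lattice point (the paper may exploit the process structure to do this more efficiently), assumes a standard sheet whose lattice points lie off the axes (where W = 0, which would make the covariance matrix singular), and uses hypothetical function names and made-up lattice values.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def sheet_cov(p: Point, q: Point) -> float:
    """Covariance of a standard Brownian sheet: min(s1, s2) * min(t1, t2)."""
    return min(p[0], q[0]) * min(p[1], q[1])


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def conditional_new_point(new: Point, lattice: Sequence[Point],
                          values: Sequence[float]) -> Tuple[float, float]:
    """Mean and variance of W(new) given W observed at the lattice points."""
    k = [[sheet_cov(p, q) for q in lattice] for p in lattice]
    c = [sheet_cov(new, p) for p in lattice]
    alpha = solve(k, list(values))   # K^{-1} y
    beta = solve(k, c)               # K^{-1} c
    mean = sum(ci * ai for ci, ai in zip(c, alpha))
    var = sheet_cov(new, new) - sum(ci * bi for ci, bi in zip(c, beta))
    return mean, max(var, 0.0)       # clip tiny negative round-off


def sample_new_point(new: Point, lattice: Sequence[Point],
                     values: Sequence[float], rng: random.Random) -> float:
    """Draw W(new) from its conditional distribution given the lattice."""
    mean, var = conditional_new_point(new, lattice, values)
    return rng.gauss(mean, math.sqrt(var))


# Hypothetical 2x2 lattice with made-up path values.
lattice = [(0.5, 0.5), (0.5, 1.0), (1.0, 0.5), (1.0, 1.0)]
values = [0.1, 0.3, -0.2, 0.4]
print(conditional_new_point((0.75, 0.75), lattice, values))
print(sample_new_point((0.75, 0.75), lattice, values, random.Random(0)))
```

To refine a region repeatedly, each sampled point can be appended to `lattice` and `values` so that later interior points are conditioned on it as well; the full covariance solve used here grows cubically with the lattice size, which is one reason a structured derivation is preferable in practice.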

Research Papers in Economics

Bayesian Multiscale Multiple Imputation With Implications for Data Confidentiality

Authors: Alan F. Karr, Daniell Toth, S. Keller-McNulty, A. Machanavajjhala, Marco A. R. Ferreira, J. Reiter, Scott H. Holan

Crossref

Research Papers in Economics